MCN: Modulated Convolutional Network
FIGURE 3.3
MCN convolution (MCconv) with multiple feature maps. There are 10 feature maps in the input and 20 in the output. The reconstructed filters are divided into 20 groups, and each group contains 10 reconstructed filters, corresponding to the numbers of output and input feature maps, respectively.
map, h = 1, i = 1, ..., 10, g = 1, ..., 10, and for the second output feature map, h = 2, i =
11, ..., 20, g = 1, ..., 10.
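The index ranges above can be captured by a small bookkeeping function. This is a hypothetical sketch of ours (the function name and layout are assumptions, not the authors' code): output feature map h is produced from the reconstructed filters i = (h − 1) · 10 + g, one for each input feature map g = 1, ..., 10.

```python
# Hypothetical index bookkeeping for the MCconv grouping described above:
# output feature map h uses reconstructed filters i = (h - 1) * 10 + g,
# one per input feature map g = 1, ..., 10.
def filter_index(h, g, num_inputs=10):
    return (h - 1) * num_inputs + g

print(filter_index(1, 1))   # 1  (first filter of output map 1)
print(filter_index(1, 10))  # 10 (last filter of output map 1)
print(filter_index(2, 1))   # 11 (first filter of output map 2)
```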
When the first convolutional layer is considered, the input size of the network is 32 × 32.² First, each image channel is copied K = 4 times, resulting in a new input of size 4 × 32 × 32 to the entire network.
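The channel replication above can be sketched in a few lines. This is a minimal illustration (variable names are ours), assuming a single gray-level channel of size 32 × 32 that is stacked K = 4 times:

```python
import numpy as np

# Sketch of the input expansion described in the text: one gray-level
# 32x32 channel is copied K = 4 times, producing the 4x32x32 input
# fed to the first MCconv layer.
K = 4
channel = np.random.rand(32, 32)            # single gray-level channel
expanded = np.stack([channel] * K, axis=0)  # shape (4, 32, 32)
print(expanded.shape)  # (4, 32, 32)
```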
It should be noted that the number of input and output channels in every feature map
is the same, so MCNs can be easily implemented by simply replicating the same MCconv
module at each layer.
3.4.2 Loss Function of MCNs
To constrain CNNs to have binarized weights, we introduce a new loss function in MCNs. Two aspects are considered: the unbinarized convolutional filters are reconstructed from the binarized ones, and intra-class compactness is enforced on the output features. We further introduce the variables used in this section: C_i^l are the unbinarized filters of the l-th convolutional layer, l ∈ {1, ..., N}; Ĉ_i^l denote the binarized filters corresponding to C_i^l; M^l denotes the modulation filter (M-Filter) shared by all C_i^l in the l-th convolutional layer, and M_j^l represents the j-th plane of M^l; ∘ is a new plane-based operation (Eq. 3.12) defined in the next section. We then have the first part of the loss function for minimization:
L_M = (θ/2) ∑_{i,l} ‖C_i^l − Ĉ_i^l ∘ M^l‖² + (λ/2) ∑_m ‖f_m(Ĉ, M⃗) − f̄(Ĉ, M⃗)‖²,    (3.18)
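A numpy sketch of Eq. (3.18) may help make the two terms concrete. This is an illustration under our own assumptions, not the authors' implementation: we approximate the plane-based operation Ĉ ∘ M by an element-wise product of each binarized filter with the shared M-Filter, and we take f̄ to be the per-class mean of the output features, so the second term penalizes intra-class spread.

```python
import numpy as np

# Illustrative sketch of Eq. (3.18). The concrete form of the plane-based
# operation "o" and of f_bar are assumptions made for this example.
def mcn_loss(C, C_hat, M, feats, labels, theta=1.0, lam=1.0):
    # C, C_hat: (num_filters, K, k, k) unbinarized / binarized filters
    # M:        (K, k, k) M-Filter shared by all filters in the layer
    # feats:    (num_samples, d) output features f_m
    # labels:   (num_samples,) class index of each sample
    recon = C_hat * M[None, ...]                  # C_hat o M (plane-wise)
    loss_filters = 0.5 * theta * np.sum((C - recon) ** 2)

    loss_features = 0.0
    for c in np.unique(labels):
        cls_feats = feats[labels == c]
        center = cls_feats.mean(axis=0)           # f_bar for class c
        loss_features += 0.5 * lam * np.sum((cls_feats - center) ** 2)
    return loss_filters + loss_features
```

When the unbinarized filters equal the reconstructed ones and each sample's features sit at its class center, both terms vanish and the loss is zero; any deviation in either term increases it quadratically.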
² We use only one channel of the gray-level images (3 × 32 × 32).